RaPiDS: an algorithm for rapid expression profile database search.

Authors

  • Paul B Horton
  • Larisa Kiseleva
  • Wataru Fujibuchi
Abstract

In this paper we present a fast algorithm and implementation for computing the Spearman rank correlation (SRC) between a query expression profile and each expression profile in a database of profiles. The algorithm runs in time linear in the size of the profile database, with a very small constant factor, and is designed to handle multiple profile platforms and missing values efficiently. We show that our specialized algorithm and C++ implementation achieve an approximately 100-fold speed-up over a reasonable baseline implementation using Perl hash tables. RaPiDS is designed for general similarity search rather than classification, but to assess the usefulness of SRC as a similarity measure we evaluate the program as a classifier of normal human cell types based on gene expression. Specifically, we use a k-nearest-neighbor classifier with a t statistic derived from SRC as the similarity measure for profile pairs. We estimate accuracy with a jackknife test on microarray data with manually checked cell type annotations. Preliminary results suggest that the measure is useful for profiles measured under similar conditions (same laboratory and chip platform), with 64% accuracy on 1,685 profiles versus 17.5% for the majority-class classifier, but that it requires improvement when comparing profiles from different experimental series.
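The quantities named in the abstract can be illustrated with a small, deliberately naive sketch. It is not the RaPiDS algorithm or its C++ implementation, and it is closer in spirit to the hash-table baseline the abstract mentions. It assumes each profile is a dictionary mapping gene identifiers to expression values, treats `None` or an absent key as a missing value, and uses the standard conversion t = rho * sqrt((n - 2) / (1 - rho^2)) as the t statistic. The function names, the data layout, and the choice of `k` are all hypothetical.

```python
import math
from collections import Counter

def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(query, profile):
    """SRC over genes measured in both profiles. Returns (rho, n)."""
    shared = sorted(g for g in query
                    if query[g] is not None and profile.get(g) is not None)
    n = len(shared)
    if n < 3:
        return None, n  # too few shared genes for a t statistic
    rx = average_ranks([query[g] for g in shared])
    ry = average_ranks([profile[g] for g in shared])
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None, n  # constant profile: correlation undefined
    return cov / math.sqrt(vx * vy), n

def t_statistic(rho, n):
    """Assumed conversion of a correlation on n points to a t statistic."""
    if abs(rho) >= 1.0:
        return math.copysign(math.inf, rho)
    return rho * math.sqrt((n - 2) / (1 - rho * rho))

def knn_label(query, database, k=3):
    """Majority label among the k database profiles with the highest t."""
    scored = []
    for label, profile in database:
        rho, n = spearman(query, profile)
        if rho is not None:
            scored.append((t_statistic(rho, n), label))
    scored.sort(key=lambda s: s[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0] if votes else None
```

Because each comparison uses only the genes present in both profiles, the number of points `n` differs from pair to pair, which is one plausible reason to rank neighbors by a t statistic rather than by the raw correlation. A jackknife estimate in this setting would call `knn_label` once per profile with that profile left out of `database`.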

Similar resources

CellMontage: Similar Expression Profile Search Server

The establishment and rapid expansion of microarray databases has created a need for new search tools. Here we present CellMontage, the first server for expression profile similarity search over a large database of 69,000 microarray experiments derived from NCBI's GEO site. CellMontage provides a novel, content-based search engine for accessing gene expression data. Microarray experiments with si...

An Effective Path-aware Approach for Keyword Search over Data Graphs

Keyword search is known as a user-friendly alternative to structured languages for retrieving information from graph-structured data. Efficiently retrieving relevant answers to a keyword query and effectively ranking these answers according to their relevance are the two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...

AN OPTIMIZED NEURO-FUZZY GROUP METHOD OF DATA HANDLING SYSTEM BASED ON GRAVITATIONAL SEARCH ALGORITHM FOR EVALUATION OF LATERAL GROUND DISPLACEMENTS

During an earthquake, significant damage can result due to instability of the soil in the area affected by internal seismic waves. A liquefaction-induced lateral ground displacement has been a very damaging type of ground failure during past strong earthquakes. In this study, neuro-fuzzy group method of data handling (NF-GMDH) is utilized for assessment of lateral displacement in both ground sl...

Parameters Assignment of Electric Train Controller by Using Gravitational Search Optimization Algorithm

The speed profile of the train will be determined according to criteria such as safety, travel convenience, and the type of electric motor used for traction. Due to the passengers and cargo on the train, the electric train load is constantly changing. This will require reassigning the speed controller’s parameters of the electric train. For this purpose, the Gravitational Search optimization Al...

A graph search algorithm: Optimal placement of passive harmonic filters in a power system

The harmonic in distribution systems becomes an important problem due to an increase in nonlinear loads. This paper presents a new approach based on a graph algorithm for optimum placement of passive harmonic filters in a multi-bus system, which suffers from harmonic current sources. The objective of this paper is to minimize the network loss, the cost of the filter and the total harmonic disto...

Journal:
  • Genome informatics. International Conference on Genome Informatics

Volume 17, Issue 2

Publication date: 2006